A Survey of Optimistic Planning in Markov Decision Processes

Authors

  • Lucian Buşoniu
  • Rémi Munos
  • Robert Babuška
Abstract

We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall receding-horizon algorithm results, which can also be seen as a type of model-predictive control. The space of planning policies is explored optimistically, focusing on areas with the largest upper bounds on the value – or upper confidence bounds, in the stochastic case. The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees. We describe in detail three recent such algorithms, outline the theoretical guarantees on their performance, and illustrate their behavior in a numerical example.

Note: Work performed in part while L. Buşoniu was with Team SequeL, INRIA Lille. He is also associated with the Automation Department, Technical University of Cluj-Napoca, Romania.
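
To make the optimistic mechanism concrete, here is a minimal Python sketch of optimistic planning for deterministic systems (OPD), one of the planners in this family. It assumes rewards normalized to [0, 1] and a deterministic simulator step(state, action) -> (next_state, reward); the function and parameter names are illustrative, not taken from the survey.

    import heapq

    def opd_action(state0, actions, step, gamma=0.9, budget=100):
        """Optimistic planning for deterministic systems (sketch).

        Repeatedly expands the leaf of the planning tree with the largest
        upper bound (b-value) on the value of action sequences passing
        through it.  With rewards in [0, 1], the rewards below depth d can
        contribute at most gamma^d / (1 - gamma) to the return.
        """
        tie = 0    # tie-breaker so the heap never has to compare states
        leaves = [(-1.0 / (1.0 - gamma), tie, state0, 0, 0.0, None)]
        best_u, best_first = -1.0, None
        for _ in range(budget):
            _, _, s, d, u, first = heapq.heappop(leaves)   # most optimistic leaf
            for a in actions:
                s2, r = step(s, a)
                u2 = u + gamma ** d * r                      # discounted return so far
                b2 = u2 + gamma ** (d + 1) / (1.0 - gamma)   # optimistic upper bound
                f = a if first is None else first
                if u2 > best_u:                              # best sequence found so far
                    best_u, best_first = u2, f
                tie += 1
                heapq.heappush(leaves, (-b2, tie, s2, d + 1, u2, f))
        return best_first

In the receding-horizon loop described above, opd_action would be called at every time step from the current state, only the returned first action applied, and the plan recomputed at the next state.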

Similar resources

Optimistic Planning in Markov Decision Processes

We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall receding-horizon algorithm results, which can also be seen as a type o...

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be classified into levels. In each level, smaller problems, called restricted MDPs, are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
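
As a rough illustration of the decomposition this abstract describes, the sketch below computes the SCCs of the transition graph (Kosaraju's algorithm) and then runs value iteration on one component at a time in reverse topological order, so each restricted MDP reuses the values of already-solved downstream components. It is a hypothetical minimal version, not the accelerated algorithm proposed in the paper; the MDP encoding P[s][a] = [(prob, next_state, reward), ...], with every state a key of P, is an assumption.

    from collections import defaultdict

    def solve_by_sccs(P, gamma=0.95, sweeps=1000, tol=1e-8):
        """Value iteration restricted to one SCC at a time (sketch).

        V(s) only depends on successor states, which lie in the same SCC
        or in a downstream one, so components can be solved from the
        sinks of the condensation graph upward.
        """
        succ = {s: {s2 for a in P[s] for _, s2, _ in P[s][a]} for s in P}
        # Kosaraju: DFS finish order on the graph, then scan the transpose.
        order, seen = [], set()
        def dfs(u):                       # recursive DFS; fine for small models
            seen.add(u)
            for v in succ[u]:
                if v not in seen:
                    dfs(v)
            order.append(u)
        for s in P:
            if s not in seen:
                dfs(s)
        pred = defaultdict(set)
        for u, vs in succ.items():
            for v in vs:
                pred[v].add(u)
        sccs, assigned = [], set()
        for u in reversed(order):         # yields SCCs sources-first
            if u in assigned:
                continue
            comp, stack = [], [u]
            while stack:
                x = stack.pop()
                if x in assigned:
                    continue
                assigned.add(x)
                comp.append(x)
                stack.extend(pred[x] - assigned)
            sccs.append(comp)
        V = defaultdict(float)
        for comp in reversed(sccs):       # solve downstream components first
            for _ in range(sweeps):
                delta = 0.0
                for s in comp:
                    v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                            for a in P[s])
                    delta = max(delta, abs(v - V[s]))
                    V[s] = v
                if delta < tol:
                    break
        return dict(V)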

Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the...
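
A hedged sketch of this construction: fixing a random seed per tree turns the stochastic simulator into a deterministic one, so a deterministic optimistic planner (for instance the opd_action sketch given earlier, passed in as planner) can build each tree, and the first-action recommendations are then aggregated. The names and the majority-vote rule are illustrative assumptions, not the paper's exact aggregation scheme.

    import random
    from collections import Counter

    def forest_action(state0, actions, stochastic_step, planner, n_trees=10):
        """Aggregate first actions over a forest of planning trees (sketch).

        stochastic_step(state, action, rng) -> (next_state, reward) draws one
        transition; seeding rng per tree fixes one realization of the
        environment, on which the deterministic planner can run.
        """
        votes = Counter()
        for k in range(n_trees):
            rng = random.Random(k)        # one random realization per tree
            memo = {}                     # cache so repeated queries agree

            def det_step(s, a, _memo=memo, _rng=rng):
                if (s, a) not in _memo:
                    _memo[(s, a)] = stochastic_step(s, a, _rng)
                return _memo[(s, a)]

            votes[planner(state0, actions, det_step)] += 1
        return votes.most_common(1)[0][0]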

Optimistic planning for Markov decision processes

The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence of the state-space size. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that op...
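
For orientation, the kind of bound such a stochastic planner maximizes can be written out. Assuming rewards in [0, 1], the value of the policies compatible with a planning tree T is upper-bounded as follows (a reconstruction from the general framework sketched in the main abstract above, not a quotation from this paper):

    b(\mathcal{T}) \;=\; \sum_{y \in \mathrm{leaves}(\mathcal{T})} \mathbb{P}(y) \left[ u(y) + \frac{\gamma^{d(y)}}{1-\gamma} \right]

where u(y) is the discounted sum of rewards along the path from the root to leaf y, P(y) the product of transition probabilities along that path, and d(y) the leaf's depth; an optimistic planner expands leaves that contribute most to this bound.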

Sample-Based Planning for Continuous Action Markov Decision Processes

In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision Processes (MDPs). Our algorithm, Hierarchical Optimistic Optimization applied to Trees (HOOT), addresses planning in continuous action MDPs, directing the exploration of the search tree using insights from recent bandit ...
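
The continuous-action ingredient, hierarchical optimistic optimization (HOO), can be sketched as follows: a binary tree recursively splits the action interval, each node keeps an empirical mean plus a confidence and smoothness bonus, and actions are sampled by descending optimistically. This simplified version (descending by the children's U-values rather than HOO's full B-values) is an illustrative assumption, not the HOOT authors' implementation; rho and nu are smoothness parameters that would need tuning.

    import math, random

    class HOONode:
        """Node of a (simplified) HOO tree over the action interval [lo, hi]."""
        def __init__(self, lo, hi, depth):
            self.lo, self.hi, self.depth = lo, hi, depth
            self.n, self.mean = 0, 0.0
            self.children = None

        def u_value(self, t, rho, nu):
            if self.n == 0:
                return float("inf")   # unvisited nodes are maximally optimistic
            # Mean + confidence width + bound on variation inside the cell.
            return (self.mean + math.sqrt(2.0 * math.log(t) / self.n)
                    + nu * rho ** self.depth)

    def hoo_pick(root, t, rho=0.5, nu=1.0, rng=random):
        """Descend by optimistic values, sample an action, split the leaf."""
        path, node = [root], root
        while node.children:
            node = max(node.children, key=lambda c: c.u_value(t, rho, nu))
            path.append(node)
        action = rng.uniform(node.lo, node.hi)
        mid = 0.5 * (node.lo + node.hi)
        node.children = [HOONode(node.lo, mid, node.depth + 1),
                         HOONode(mid, node.hi, node.depth + 1)]
        return action, path

    def hoo_update(path, ret):
        """Propagate an observed rollout return up the selected path."""
        for node in path:
            node.n += 1
            node.mean += (ret - node.mean) / node.n

In HOOT, roughly, such a bandit would sit at each node of the planning tree: at round t the node calls hoo_pick to choose a continuous action for the rollout, and hoo_update feeds the sampled return back along the selected path.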

Publication date: 2012